Provably Efficient Reinforcement Learning in Decentralized General-Sum Markov Games

Authors

Abstract

This paper addresses the problem of learning an equilibrium efficiently in general-sum Markov games through decentralized multi-agent reinforcement learning. Given the fundamental difficulty of calculating a Nash equilibrium (NE), we instead aim at finding a coarse correlated equilibrium (CCE), a solution concept that generalizes NE by allowing possible correlations among the agents' strategies. We propose an algorithm in which each agent independently runs optimistic V-learning (a variant of Q-learning) to explore the unknown environment, while using a stabilized online mirror descent (OMD) subroutine for policy updates. We show that the agents can find an $\epsilon$-approximate CCE in at most $\widetilde{O}(H^6 S A / \epsilon^2)$ episodes, where $S$ is the number of states, $A$ is the size of the largest individual action space, and $H$ is the length of an episode. This appears to be the first sample complexity result for learning in generic general-sum Markov games. Our results rely on a novel investigation of an anytime high-probability regret bound for OMD with a dynamic learning rate and weighted regret, which would be of independent interest. One key feature of our algorithm is that it is decentralized, in the sense that each agent has access only to its local information and is completely oblivious to the presence of the others. In this way, our algorithm can readily scale up to an arbitrary number of agents, without suffering from an exponential dependence on the number of agents.
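As a rough illustration of the kind of per-agent update the abstract describes, the following is a minimal Python sketch of optimistic V-learning combined with an entropy-regularized OMD (multiplicative-weights) policy step. The class name, bonus constant, and learning-rate schedules are illustrative assumptions, not the paper's exact algorithm or constants; the only point it conveys is that each agent updates its value table and per-state policy from purely local observations.

```python
import numpy as np

class DecentralizedVLearningAgent:
    """Sketch of one agent's local updates: an optimistic V-learning value
    update plus an entropy-regularized OMD (multiplicative-weights) policy
    step with a dynamic learning rate. All constants are placeholders."""

    def __init__(self, num_states, num_actions, horizon, c_bonus=1.0):
        self.S, self.A, self.H = num_states, num_actions, horizon
        self.c_bonus = c_bonus
        # Optimistic value estimates, one table per step h; V at step H is 0.
        self.V = np.full((horizon + 1, num_states), float(horizon))
        self.V[horizon] = 0.0
        self.counts = np.zeros((horizon, num_states), dtype=int)
        # Per-(h, s) policy over this agent's OWN actions only.
        self.policy = np.full((horizon, num_states, num_actions), 1.0 / num_actions)

    def act(self, h, s, rng):
        # rng is a numpy.random.Generator.
        return rng.choice(self.A, p=self.policy[h, s])

    def update(self, h, s, a, reward, next_s):
        """Called once per visited (h, s); uses only local information."""
        self.counts[h, s] += 1
        t = self.counts[h, s]
        alpha = (self.H + 1) / (self.H + t)               # V-learning step size
        bonus = self.c_bonus * np.sqrt(self.H ** 3 / t)   # optimism bonus (illustrative)
        target = reward + self.V[h + 1, next_s] + bonus
        self.V[h, s] = min(self.H, (1 - alpha) * self.V[h, s] + alpha * target)

        # OMD policy update with dynamic learning rate eta_t ~ 1/sqrt(t),
        # using an importance-weighted loss for the action actually played.
        loss = np.zeros(self.A)
        played_loss = (self.H - (reward + self.V[h + 1, next_s])) / self.H
        loss[a] = played_loss / max(self.policy[h, s, a], 1e-8)
        eta = np.sqrt(np.log(self.A) / (self.A * t))
        weights = self.policy[h, s] * np.exp(-eta * loss)
        self.policy[h, s] = weights / weights.sum()
```

In a run over many episodes, every agent would maintain such an object independently; the correlated output strategy certifying an approximate CCE would be obtained from the history of the episodes' policies rather than from any single iterate.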

Similar Articles

Taking turns in general sum Markov games

This paper provides a novel approach to multi-agent coordination in general sum Markov games. Contrary to what is common in multi-agent learning, our approach does not focus on reaching a particular equilibrium between agent policies. Instead, it learns a basis set of special joint agent policies, over which it can randomize to build different solutions. The main idea is to tackle a Markov game...

Full Text

Hierarchical Multiagent Reinforcement Learning in Markov Games

Interactions between intelligent agents in multiagent systems can be modeled and analyzed by using game theory. The agents select actions that maximize their utility function so that they also take into account the behavior of the other agents in the system. Each agent should therefore utilize some model of the other agents. In this paper, the focus is on the situation which has a temporal stru...

Full Text

Value-function reinforcement learning in Markov games

Markov games are a model of multiagent environments that are convenient for studying multiagent reinforcement learning. This paper describes a set of reinforcement-learning algorithms based on estimating value functions and presents convergence theorems for these algorithms. The main contribution of this paper is that it presents the convergence theorems in a way that makes it easy to reason ab...

Full Text

QL2, a simple reinforcement learning scheme for two-player zero-sum Markov games

Markov games are a framework which formalises n-agent reinforcement learning. For instance, Littman proposed the minimax-Q algorithm to model two-agent zero-sum problems. This paper proposes a new simple algorithm in this framework, QL2, and compares it to several standard algorithms (Q-learning, Minimax and minimax-Q). Experiments show that QL2 converges to optimal mixed policies, as minimax-Q...

Full Text
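The minimax-Q baseline mentioned in the snippet above hinges on one computation: at each state the agent solves the matrix game defined by its Q-values to obtain the minimax value used in the Bellman backup. Below is a small sketch of that step using scipy's linear-programming routine; the function name, and the alpha/gamma constants in the commented update rule, are illustrative assumptions rather than code from the cited paper.

```python
import numpy as np
from scipy.optimize import linprog

def minimax_value(Q_s):
    """Solve max_pi min_o sum_a pi(a) * Q_s[a, o] as a linear program.
    Q_s holds the Q-values at one state: rows = own actions,
    columns = opponent actions. Returns (value, mixed policy)."""
    n_a, n_o = Q_s.shape
    # Variables x = [pi(1..n_a), v]; maximize v  <=>  minimize -v.
    c = np.zeros(n_a + 1)
    c[-1] = -1.0
    # For every opponent action o: v - pi^T Q_s[:, o] <= 0.
    A_ub = np.hstack([-Q_s.T, np.ones((n_o, 1))])
    b_ub = np.zeros(n_o)
    # Probabilities sum to one.
    A_eq = np.concatenate([np.ones(n_a), [0.0]]).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * n_a + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:n_a]

# A minimax-Q style update at a visited (s, a, o) would then read roughly:
# Q[s, a, o] += alpha * (r + gamma * minimax_value(Q[s_next])[0] - Q[s, a, o])
```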

Multiagent reinforcement learning: algorithm converging to Nash equilibrium in general-sum discounted stochastic games

Reinforcement learning turned out to be a technique that allowed robots to ride a bicycle, computers to play backgammon on the level of human world masters, and solve such complicated tasks of high dimensionality as elevator dispatching. Can it come to the rescue in the next generation of challenging problems like playing football or bidding on virtual markets? Reinforcement learning that provides a way o...

Full Text

Journal

Journal Title: Dynamic Games and Applications

Year: 2022

ISSN: 2153-0793, 2153-0785

DOI: https://doi.org/10.1007/s13235-021-00420-0